[VPlan] Materialize vector trip count using VPInstructions. #151925

fhahn · 2025-08-04T09:13:28Z

Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation.

llvmbot · 2025-08-04T09:14:05Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation.

The underlying value of the vector trip count VPValue needs to be re-set the generated IR value for the vector trip count to keep legacy epilogue skeleton generation working.

But once the full skeleton is modeled in VPlan, we will be able to also handle the epilogue skeleton in VPlan.

Patch is 1.19 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151925.diff

135 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+13-67)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+12-8)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+57-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+10-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+4-16)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+1-10)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll (+5-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll (+5-7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+2-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-with-gaps.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+5-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+10-26)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+6-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_prefer_scalable.ll (+23-25)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll (+18-54)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+35-111)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll (+11-45)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr151664-cost-hoisted-vector-scalable.ll (+1-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll (-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll (+6-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+675-747)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/store-costs-sve.ll (+4-20)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/struct-return-cost.ll (+3-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-cond-inv-loads.ll (+6-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-strict-reductions.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+16-48)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll (+10-16)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+19-38)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll (+177-185)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-live-out-pointer-induction.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll (+49-20)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-multiexit.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll (+1-7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+2-38)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+14-73)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll (+5-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (+6-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt-epilogue.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt-too-many-deps.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+20-30)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+5-23)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll (+6-5)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/blend-any-of-reduction-cost.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+16-34)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+8-16)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/defaults.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+9-27)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/f16.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+1-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/fminimumnum.ll (+12-36)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+6-38)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+22-58)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-masked-access.ll (+112-120)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+4-11)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+8-24)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr87378-vpinstruction-or-drop-poison-generating-flags.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/reductions.ll (+16-48)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/remark-reductions.ll (+2-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+58-82)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+6-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll (+1-39)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+6-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+15-33)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-bin-unary-ops-args.ll (+18-162)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-call-intrinsics.ll (+9-81)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cast-intrinsics.ll (+12-100)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cond-reduction.ll (+12-76)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-div.ll (+4-36)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll (+15-51)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-gather-scatter.ll (+3-11)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-inloop-reduction.ll (+28-140)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-interleave.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-intermediate-store.ll (+2-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-iv32.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-known-no-overflow.ll (-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-masked-loadstore.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-ordered-reduction.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll (+28-140)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll (+10-32)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-safe-dep-distance.ll (+2-23)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-uniform-store.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll (+3-15)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+11-71)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vector-loop-backedge-elimination-with-evl.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/X86/scev-checks-unprofitable.ll (-3)
(modified) llvm/test/Transforms/LoopVectorize/dbg-outer-loop-vect.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/outer_loop_scalable.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+14-38)
(modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+9-17)
(modified) llvm/test/Transforms/LoopVectorize/scalable-iv-outside-user.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll (+13-17)
(modified) llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll (-4)
(modified) llvm/test/Transforms/LoopVectorize/scalable-predication.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination.ll (-18)
(modified) llvm/test/Transforms/LoopVectorize/vectorize-force-tail-with-evl.ll (+2-5)
(modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/vplan-predicate-switch.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll (+7-7)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d04317bd8822d..1175c9f277cc0 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -548,9 +548,6 @@ class InnerLoopVectorizer {
 protected:
   friend class LoopVectorizationPlanner;
 
-  /// Returns (and creates if needed) the trip count of the widened loop.
-  Value *getOrCreateVectorTripCount(BasicBlock *InsertBlock);
-
   // Create a check to see if the vector loop should be executed
   Value *createIterationCountCheck(ElementCount VF, unsigned UF) const;
 
@@ -602,6 +599,7 @@ class InnerLoopVectorizer {
   /// vector elements.
   ElementCount VF;
 
+public:
   ElementCount MinProfitableTripCount;
 
   /// The vectorization unroll factor to use. Each scalar is vectorized to this
@@ -2279,56 +2277,6 @@ static bool useMaskedInterleavedAccesses(const TargetTransformInfo &TTI) {
   return TTI.enableMaskedInterleavedAccessVectorization();
 }
 
-Value *
-InnerLoopVectorizer::getOrCreateVectorTripCount(BasicBlock *InsertBlock) {
-  if (VectorTripCount)
-    return VectorTripCount;
-
-  Value *TC = getTripCount();
-  IRBuilder<> Builder(InsertBlock->getTerminator());
-
-  Type *Ty = TC->getType();
-  // This is where we can make the step a runtime constant.
-  Value *Step = createStepForVF(Builder, Ty, VF, UF);
-
-  // If the tail is to be folded by masking, round the number of iterations N
-  // up to a multiple of Step instead of rounding down. This is done by first
-  // adding Step-1 and then rounding down. Note that it's ok if this addition
-  // overflows: the vector induction variable will eventually wrap to zero given
-  // that it starts at zero and its Step is a power of two; the loop will then
-  // exit, with the last early-exit vector comparison also producing all-true.
-  // For scalable vectors the VF is not guaranteed to be a power of 2, but this
-  // is accounted for in emitIterationCountCheck that adds an overflow check.
-  if (Cost->foldTailByMasking()) {
-    assert(isPowerOf2_32(VF.getKnownMinValue() * UF) &&
-           "VF*UF must be a power of 2 when folding tail by masking");
-    TC = Builder.CreateAdd(TC, Builder.CreateSub(Step, ConstantInt::get(Ty, 1)),
-                           "n.rnd.up");
-  }
-
-  // Now we need to generate the expression for the part of the loop that the
-  // vectorized body will execute. This is equal to N - (N % Step) if scalar
-  // iterations are not required for correctness, or N - Step, otherwise. Step
-  // is equal to the vectorization factor (number of SIMD elements) times the
-  // unroll factor (number of SIMD instructions).
-  Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");
-
-  // There are cases where we *must* run at least one iteration in the remainder
-  // loop.  See the cost model for when this can happen.  If the step evenly
-  // divides the trip count, we set the remainder to be equal to the step. If
-  // the step does not evenly divide the trip count, no adjustment is necessary
-  // since there will already be scalar iterations. Note that the minimum
-  // iterations check ensures that N >= Step.
-  if (Cost->requiresScalarEpilogue(VF.isVector())) {
-    auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
-    R = Builder.CreateSelect(IsZero, Step, R);
-  }
-
-  VectorTripCount = Builder.CreateSub(TC, R, "n.vec");
-
-  return VectorTripCount;
-}
-
 void InnerLoopVectorizer::introduceCheckBlockInVPlan(BasicBlock *CheckIRBB) {
   // Note: The block with the minimum trip-count check is already connected
   // during earlier VPlan construction.
@@ -7319,6 +7267,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   // Canonicalize EVL loops after regions are dissolved.
   VPlanTransforms::canonicalizeEVLLoops(BestVPlan);
   VPlanTransforms::materializeBackedgeTakenCount(BestVPlan, VectorPH);
+  VPlanTransforms::materializeVectorTripCount(
+      BestVPlan, VectorPH, CM.foldTailByMasking(),
+      CM.requiresScalarEpilogue(BestVF.isVector()));
 
   // Perform the actual loop transformation.
   VPTransformState State(&TTI, BestVF, LI, DT, ILV.AC, ILV.Builder, &BestVPlan,
@@ -7375,8 +7326,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   //===------------------------------------------------===//
 
   // 2. Copy and widen instructions from the old loop into the new loop.
-  BestVPlan.prepareToExecute(
-      ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
+  BestVPlan.prepareToExecute(State);
   replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);
 
   // Move check blocks to their final position.
@@ -7496,7 +7446,6 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() {
       emitIterationCountCheck(LoopScalarPreHeader, false);
 
   // Generate the induction variable.
-  EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
 
   replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
   return LoopVectorPreHeader;
@@ -9376,13 +9325,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) {
       State.Builder, Index, getStartValue()->getLiveInIRValue(), Step, Kind,
       cast_if_present<BinaryOperator>(FPBinOp));
   DerivedIV->setName(Name);
-  // If index is the vector trip count, the concrete value will only be set in
-  // prepareToExecute, leading to missed simplifications, e.g. if it is 0.
-  // TODO: Remove the special case for the vector trip count once it is computed
-  // in VPlan and can be used during VPlan simplification.
-  assert((DerivedIV != Index ||
-          getOperand(1) == &getParent()->getPlan()->getVectorTripCount()) &&
-         "IV didn't need transforming?");
   State.set(this, DerivedIV, VPLane(0));
 }
 
@@ -10272,8 +10214,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
 
       // TODO: Move to general VPlan pipeline once epilogue loops are also
       // supported.
-      VPlanTransforms::runPass(VPlanTransforms::materializeVectorTripCount,
-                               BestPlan, VF.Width, IC, PSE);
+      VPlanTransforms::runPass(
+          VPlanTransforms::materializeConstantVectorTripCount, BestPlan,
+          VF.Width, IC, PSE);
 
       LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);
 
@@ -10312,6 +10255,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         // edges from the first pass.
         EPI.MainLoopVF = EPI.EpilogueVF;
         EPI.MainLoopUF = EPI.EpilogueUF;
+        EPI.VectorTripCount =
+            BestMainPlan->getVectorTripCount().getLiveInIRValue();
         EpilogueVectorizerEpilogueLoop EpilogILV(L, PSE, LI, DT, TLI, TTI, AC,
                                                  ORE, EPI, &CM, BFI, PSI,
                                                  Checks, BestEpiPlan);
@@ -10344,8 +10289,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                                Checks, BestPlan);
         // TODO: Move to general VPlan pipeline once epilogue loops are also
         // supported.
-        VPlanTransforms::runPass(VPlanTransforms::materializeVectorTripCount,
-                                 BestPlan, VF.Width, IC, PSE);
+        VPlanTransforms::runPass(
+            VPlanTransforms::materializeConstantVectorTripCount, BestPlan,
+            VF.Width, IC, PSE);
 
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 2138b4154d2c2..4038a120d6bfe 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -951,15 +951,9 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-void VPlan::prepareToExecute(Value *VectorTripCountV, VPTransformState &State) {
-  if (!VectorTripCount.getUnderlyingValue())
-    VectorTripCount.setUnderlyingValue(VectorTripCountV);
-  else
-    assert(VectorTripCount.getUnderlyingValue() == VectorTripCountV &&
-           "VectorTripCount set earlier must much VectorTripCountV");
-
+void VPlan::prepareToExecute(VPTransformState &State) {
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
-  Type *TCTy = VectorTripCountV->getType();
+  Type *TCTy = VPTypeAnalysis(*this).inferScalarType(getTripCount());
   // FIXME: Model VF * UF computation completely in VPlan.
   unsigned UF = getUF();
   if (VF.getNumUsers()) {
@@ -1023,6 +1017,16 @@ void VPlan::execute(VPTransformState *State) {
     Block->execute(State);
 
   State->CFG.DTU.flush();
+  auto *ScalarPH = getScalarPreheader();
+
+  // Set the underlying value of the vector trip count VPValue to the generated
+  // vector trip count by accessing the incoming value from the resume phi.
+  // TODO: This is needed, as the epilogue skeleton generation code needs to
+  // access the IR value. Remove once skeleton is modeled completely in VPlan.
+  if (!getVectorTripCount().getLiveInIRValue() &&
+      ScalarPH->getNumPredecessors() > 0)
+    getVectorTripCount().setUnderlyingValue(
+        State->get(cast<VPPhi>(&*ScalarPH->begin())->getOperand(0), true));
 
   VPBasicBlock *Header = vputils::getFirstLoopHeader(*this, State->VPDT);
   if (!Header)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 6f547a31f4b9f..3084d50683644 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3958,7 +3958,7 @@ class VPlan {
   }
 
   /// Prepare the plan for execution, setting up the required live-in values.
-  void prepareToExecute(Value *VectorTripCount, VPTransformState &State);
+  void prepareToExecute(VPTransformState &State);
 
   /// Generate the IR code for this VPlan.
   void execute(VPTransformState *State);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 3ecffc7593d49..fb3300d94ff1f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3153,7 +3153,7 @@ void VPlanTransforms::materializeBroadcasts(VPlan &Plan) {
   }
 }
 
-void VPlanTransforms::materializeVectorTripCount(
+void VPlanTransforms::materializeConstantVectorTripCount(
     VPlan &Plan, ElementCount BestVF, unsigned BestUF,
     PredicatedScalarEvolution &PSE) {
   assert(Plan.hasVF(BestVF) && "BestVF is not available in Plan");
@@ -3191,6 +3191,62 @@ void VPlanTransforms::materializeBackedgeTakenCount(VPlan &Plan,
   BTC->replaceAllUsesWith(TCMO);
 }
 
+void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
+                                                 VPBasicBlock *VectorPHVPBB,
+                                                 bool TailByMasking,
+                                                 bool RequiresScalarEpilogue) {
+  VPValue &VectorTC = Plan.getVectorTripCount();
+  if (VectorTC.getNumUsers() == 0 ||
+      (VectorTC.isLiveIn() && VectorTC.getLiveInIRValue()))
+    return;
+  VPValue *TC = Plan.getTripCount();
+  Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(TC);
+  VPBuilder Builder(VectorPHVPBB, VectorPHVPBB->begin());
+
+  VPValue *Step = &Plan.getVFxUF();
+
+  // If the tail is to be folded by masking, round the number of iterations N
+  // up to a multiple of Step instead of rounding down. This is done by first
+  // adding Step-1 and then rounding down. Note that it's ok if this addition
+  // overflows: the vector induction variable will eventually wrap to zero given
+  // that it starts at zero and its Step is a power of two; the loop will then
+  // exit, with the last early-exit vector comparison also producing all-true.
+  // For scalable vectors the VF is not guaranteed to be a power of 2, but this
+  // is accounted for in emitIterationCountCheck that adds an overflow check.
+  if (TailByMasking) {
+    TC = Builder.createNaryOp(
+        Instruction::Add,
+        {TC, Builder.createNaryOp(
+                 Instruction::Sub,
+                 {Step, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 1))})},
+        DebugLoc::getUnknown(), "n.rnd.up");
+  }
+
+  // Now we need to generate the expression for the part of the loop that the
+  // vectorized body will execute. This is equal to N - (N % Step) if scalar
+  // iterations are not required for correctness, or N - Step, otherwise. Step
+  // is equal to the vectorization factor (number of SIMD elements) times the
+  // unroll factor (number of SIMD instructions).
+  VPValue *R = Builder.createNaryOp(Instruction::URem, {TC, Step},
+                                    DebugLoc::getUnknown(), "n.mod.vf");
+
+  // There are cases where we *must* run at least one iteration in the remainder
+  // loop.  See the cost model for when this can happen.  If the step evenly
+  // divides the trip count, we set the remainder to be equal to the step. If
+  // the step does not evenly divide the trip count, no adjustment is necessary
+  // since there will already be scalar iterations. Note that the minimum
+  // iterations check ensures that N >= Step.
+  if (RequiresScalarEpilogue) {
+    auto *IsZero = Builder.createICmp(
+        CmpInst::ICMP_EQ, R, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 0)));
+    R = Builder.createSelect(IsZero, Step, R);
+  }
+
+  auto Res = Builder.createNaryOp(Instruction::Sub, {TC, R},
+                                  DebugLoc::getUnknown(), "n.vec");
+  Plan.getVectorTripCount().replaceAllUsesWith(Res);
+}
+
 /// Returns true if \p V is VPWidenLoadRecipe or VPInterleaveRecipe that can be
 /// converted to a narrower recipe. \p V is used by a wide recipe that feeds a
 /// store interleave group at index \p Idx, \p WideMember0 is the recipe feeding
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 5943684e17a76..2a33fc8a6621a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -252,9 +252,16 @@ struct VPlanTransforms {
 
   // Materialize vector trip counts for constants early if it can simply be
   // computed as (Original TC / VF * UF) * VF * UF.
-  static void materializeVectorTripCount(VPlan &Plan, ElementCount BestVF,
-                                         unsigned BestUF,
-                                         PredicatedScalarEvolution &PSE);
+  static void
+  materializeConstantVectorTripCount(VPlan &Plan, ElementCount BestVF,
+                                     unsigned BestUF,
+                                     PredicatedScalarEvolution &PSE);
+
+  // Materialize vector trip count computations to a set of VPInstructions.
+  static void materializeVectorTripCount(VPlan &Plan,
+                                         VPBasicBlock *VectorPHVPBB,
+                                         bool TailByMasking,
+                                         bool RequiresScalarEpilogue);
 
   /// Materialize the backedge-taken count to be computed explicitly using
   /// VPInstructions.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 795de3d978e74..04ff5fe37a6df 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -9,19 +9,13 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
-; CHECK-NEXT:    [[TMP4:%.*]] = sub i64 [[TMP1]], 1
-; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 8, [[TMP4]]
-; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
-; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 8
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul <vscale x 8 x i64> [[TMP8]], splat (i64 1)
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = add <vscale x 8 x i64> zeroinitializer, [[TMP7]]
-; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP6]]
+; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP1]]
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[TMP12]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[DOTSPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
@@ -34,7 +28,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[TMP11:%.*]] = lshr <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = trunc <vscale x 8 x i64> [[TMP11]] to <vscale x 8 x i8>
 ; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[NEXT_GEP]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
-; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 8)
 ; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
@@ -92,19 +86,13 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
-; CHECK-NEXT:    [[TMP4:%.*]] = sub i64 [[TMP1]], 1
-; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP4]]
-; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
-; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 8
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul <vscale x 8 x i64> [[TMP8]], splat (i64 1)
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = add <vscale x 8 x i64> zeroinitializer, [[TMP7]]
-; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP6]]
+; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP1]]
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[TMP12]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[DOTSPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
@@ -117,7 +105,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT:    [[TMP11:%.*]] = lshr <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = trunc <vscale x 8 x i64> [[TMP11]] to <vscale x 8 x i8>
 ; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[NEXT_GEP]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
-; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
 ; CHECK-NEXT:    br i...
[truncated]

llvmbot · 2025-08-04T09:14:06Z

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation.

The underlying value of the vector trip count VPValue needs to be re-set the generated IR value for the vector trip count to keep legacy epilogue skeleton generation working.

But once the full skeleton is modeled in VPlan, we will be able to also handle the epilogue skeleton in VPlan.

Patch is 1.19 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151925.diff

135 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+13-67)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+12-8)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+57-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+10-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+4-16)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+1-10)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll (+5-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll (+5-7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+2-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-with-gaps.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+5-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+10-26)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+6-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_prefer_scalable.ll (+23-25)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll (+18-54)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+35-111)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll (+11-45)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr151664-cost-hoisted-vector-scalable.ll (+1-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll (-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll (+6-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+675-747)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/store-costs-sve.ll (+4-20)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/struct-return-cost.ll (+3-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-cond-inv-loads.ll (+6-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-strict-reductions.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+16-48)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll (+10-16)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+19-38)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll (+177-185)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-live-out-pointer-induction.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll (+49-20)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-multiexit.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll (+1-7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+2-38)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+14-73)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll (+5-19)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (+6-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt-epilogue.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt-too-many-deps.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+20-30)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+5-23)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll (+6-5)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/blend-any-of-reduction-cost.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+16-34)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+8-16)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/defaults.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+9-27)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/f16.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+1-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/fminimumnum.ll (+12-36)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+6-38)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+22-58)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-masked-access.ll (+112-120)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+4-11)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+4-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+8-24)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr87378-vpinstruction-or-drop-poison-generating-flags.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/reductions.ll (+16-48)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/remark-reductions.ll (+2-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+58-82)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+6-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll (+1-39)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+6-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+15-33)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-bin-unary-ops-args.ll (+18-162)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-call-intrinsics.ll (+9-81)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cast-intrinsics.ll (+12-100)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cond-reduction.ll (+12-76)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-div.ll (+4-36)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll (+15-51)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-gather-scatter.ll (+3-11)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-inloop-reduction.ll (+28-140)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-interleave.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-intermediate-store.ll (+2-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-iv32.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-known-no-overflow.ll (-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-masked-loadstore.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-ordered-reduction.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll (+28-140)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll (+10-32)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-safe-dep-distance.ll (+2-23)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-uniform-store.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll (+3-15)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+11-71)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vector-loop-backedge-elimination-with-evl.ll (-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+2-10)
(modified) llvm/test/Transforms/LoopVectorize/X86/scev-checks-unprofitable.ll (-3)
(modified) llvm/test/Transforms/LoopVectorize/dbg-outer-loop-vect.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/outer_loop_scalable.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+14-38)
(modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+9-17)
(modified) llvm/test/Transforms/LoopVectorize/scalable-iv-outside-user.ll (+2-4)
(modified) llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll (+13-17)
(modified) llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll (-4)
(modified) llvm/test/Transforms/LoopVectorize/scalable-predication.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+2-6)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination.ll (-18)
(modified) llvm/test/Transforms/LoopVectorize/vectorize-force-tail-with-evl.ll (+2-5)
(modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/vplan-predicate-switch.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll (+7-7)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d04317bd8822d..1175c9f277cc0 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -548,9 +548,6 @@ class InnerLoopVectorizer {
 protected:
   friend class LoopVectorizationPlanner;
 
-  /// Returns (and creates if needed) the trip count of the widened loop.
-  Value *getOrCreateVectorTripCount(BasicBlock *InsertBlock);
-
   // Create a check to see if the vector loop should be executed
   Value *createIterationCountCheck(ElementCount VF, unsigned UF) const;
 
@@ -602,6 +599,7 @@ class InnerLoopVectorizer {
   /// vector elements.
   ElementCount VF;
 
+public:
   ElementCount MinProfitableTripCount;
 
   /// The vectorization unroll factor to use. Each scalar is vectorized to this
@@ -2279,56 +2277,6 @@ static bool useMaskedInterleavedAccesses(const TargetTransformInfo &TTI) {
   return TTI.enableMaskedInterleavedAccessVectorization();
 }
 
-Value *
-InnerLoopVectorizer::getOrCreateVectorTripCount(BasicBlock *InsertBlock) {
-  if (VectorTripCount)
-    return VectorTripCount;
-
-  Value *TC = getTripCount();
-  IRBuilder<> Builder(InsertBlock->getTerminator());
-
-  Type *Ty = TC->getType();
-  // This is where we can make the step a runtime constant.
-  Value *Step = createStepForVF(Builder, Ty, VF, UF);
-
-  // If the tail is to be folded by masking, round the number of iterations N
-  // up to a multiple of Step instead of rounding down. This is done by first
-  // adding Step-1 and then rounding down. Note that it's ok if this addition
-  // overflows: the vector induction variable will eventually wrap to zero given
-  // that it starts at zero and its Step is a power of two; the loop will then
-  // exit, with the last early-exit vector comparison also producing all-true.
-  // For scalable vectors the VF is not guaranteed to be a power of 2, but this
-  // is accounted for in emitIterationCountCheck that adds an overflow check.
-  if (Cost->foldTailByMasking()) {
-    assert(isPowerOf2_32(VF.getKnownMinValue() * UF) &&
-           "VF*UF must be a power of 2 when folding tail by masking");
-    TC = Builder.CreateAdd(TC, Builder.CreateSub(Step, ConstantInt::get(Ty, 1)),
-                           "n.rnd.up");
-  }
-
-  // Now we need to generate the expression for the part of the loop that the
-  // vectorized body will execute. This is equal to N - (N % Step) if scalar
-  // iterations are not required for correctness, or N - Step, otherwise. Step
-  // is equal to the vectorization factor (number of SIMD elements) times the
-  // unroll factor (number of SIMD instructions).
-  Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");
-
-  // There are cases where we *must* run at least one iteration in the remainder
-  // loop.  See the cost model for when this can happen.  If the step evenly
-  // divides the trip count, we set the remainder to be equal to the step. If
-  // the step does not evenly divide the trip count, no adjustment is necessary
-  // since there will already be scalar iterations. Note that the minimum
-  // iterations check ensures that N >= Step.
-  if (Cost->requiresScalarEpilogue(VF.isVector())) {
-    auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
-    R = Builder.CreateSelect(IsZero, Step, R);
-  }
-
-  VectorTripCount = Builder.CreateSub(TC, R, "n.vec");
-
-  return VectorTripCount;
-}
-
 void InnerLoopVectorizer::introduceCheckBlockInVPlan(BasicBlock *CheckIRBB) {
   // Note: The block with the minimum trip-count check is already connected
   // during earlier VPlan construction.
@@ -7319,6 +7267,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   // Canonicalize EVL loops after regions are dissolved.
   VPlanTransforms::canonicalizeEVLLoops(BestVPlan);
   VPlanTransforms::materializeBackedgeTakenCount(BestVPlan, VectorPH);
+  VPlanTransforms::materializeVectorTripCount(
+      BestVPlan, VectorPH, CM.foldTailByMasking(),
+      CM.requiresScalarEpilogue(BestVF.isVector()));
 
   // Perform the actual loop transformation.
   VPTransformState State(&TTI, BestVF, LI, DT, ILV.AC, ILV.Builder, &BestVPlan,
@@ -7375,8 +7326,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   //===------------------------------------------------===//
 
   // 2. Copy and widen instructions from the old loop into the new loop.
-  BestVPlan.prepareToExecute(
-      ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
+  BestVPlan.prepareToExecute(State);
   replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);
 
   // Move check blocks to their final position.
@@ -7496,7 +7446,6 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() {
       emitIterationCountCheck(LoopScalarPreHeader, false);
 
   // Generate the induction variable.
-  EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
 
   replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
   return LoopVectorPreHeader;
@@ -9376,13 +9325,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) {
       State.Builder, Index, getStartValue()->getLiveInIRValue(), Step, Kind,
       cast_if_present<BinaryOperator>(FPBinOp));
   DerivedIV->setName(Name);
-  // If index is the vector trip count, the concrete value will only be set in
-  // prepareToExecute, leading to missed simplifications, e.g. if it is 0.
-  // TODO: Remove the special case for the vector trip count once it is computed
-  // in VPlan and can be used during VPlan simplification.
-  assert((DerivedIV != Index ||
-          getOperand(1) == &getParent()->getPlan()->getVectorTripCount()) &&
-         "IV didn't need transforming?");
   State.set(this, DerivedIV, VPLane(0));
 }
 
@@ -10272,8 +10214,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
 
       // TODO: Move to general VPlan pipeline once epilogue loops are also
       // supported.
-      VPlanTransforms::runPass(VPlanTransforms::materializeVectorTripCount,
-                               BestPlan, VF.Width, IC, PSE);
+      VPlanTransforms::runPass(
+          VPlanTransforms::materializeConstantVectorTripCount, BestPlan,
+          VF.Width, IC, PSE);
 
       LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);
 
@@ -10312,6 +10255,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         // edges from the first pass.
         EPI.MainLoopVF = EPI.EpilogueVF;
         EPI.MainLoopUF = EPI.EpilogueUF;
+        EPI.VectorTripCount =
+            BestMainPlan->getVectorTripCount().getLiveInIRValue();
         EpilogueVectorizerEpilogueLoop EpilogILV(L, PSE, LI, DT, TLI, TTI, AC,
                                                  ORE, EPI, &CM, BFI, PSI,
                                                  Checks, BestEpiPlan);
@@ -10344,8 +10289,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                                Checks, BestPlan);
         // TODO: Move to general VPlan pipeline once epilogue loops are also
         // supported.
-        VPlanTransforms::runPass(VPlanTransforms::materializeVectorTripCount,
-                                 BestPlan, VF.Width, IC, PSE);
+        VPlanTransforms::runPass(
+            VPlanTransforms::materializeConstantVectorTripCount, BestPlan,
+            VF.Width, IC, PSE);
 
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 2138b4154d2c2..4038a120d6bfe 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -951,15 +951,9 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-void VPlan::prepareToExecute(Value *VectorTripCountV, VPTransformState &State) {
-  if (!VectorTripCount.getUnderlyingValue())
-    VectorTripCount.setUnderlyingValue(VectorTripCountV);
-  else
-    assert(VectorTripCount.getUnderlyingValue() == VectorTripCountV &&
-           "VectorTripCount set earlier must much VectorTripCountV");
-
+void VPlan::prepareToExecute(VPTransformState &State) {
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
-  Type *TCTy = VectorTripCountV->getType();
+  Type *TCTy = VPTypeAnalysis(*this).inferScalarType(getTripCount());
   // FIXME: Model VF * UF computation completely in VPlan.
   unsigned UF = getUF();
   if (VF.getNumUsers()) {
@@ -1023,6 +1017,16 @@ void VPlan::execute(VPTransformState *State) {
     Block->execute(State);
 
   State->CFG.DTU.flush();
+  auto *ScalarPH = getScalarPreheader();
+
+  // Set the underlying value of the vector trip count VPValue to the generated
+  // vector trip count by accessing the incoming value from the resume phi.
+  // TODO: This is needed, as the epilogue skeleton generation code needs to
+  // access the IR value. Remove once skeleton is modeled completely in VPlan.
+  if (!getVectorTripCount().getLiveInIRValue() &&
+      ScalarPH->getNumPredecessors() > 0)
+    getVectorTripCount().setUnderlyingValue(
+        State->get(cast<VPPhi>(&*ScalarPH->begin())->getOperand(0), true));
 
   VPBasicBlock *Header = vputils::getFirstLoopHeader(*this, State->VPDT);
   if (!Header)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 6f547a31f4b9f..3084d50683644 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3958,7 +3958,7 @@ class VPlan {
   }
 
   /// Prepare the plan for execution, setting up the required live-in values.
-  void prepareToExecute(Value *VectorTripCount, VPTransformState &State);
+  void prepareToExecute(VPTransformState &State);
 
   /// Generate the IR code for this VPlan.
   void execute(VPTransformState *State);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 3ecffc7593d49..fb3300d94ff1f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3153,7 +3153,7 @@ void VPlanTransforms::materializeBroadcasts(VPlan &Plan) {
   }
 }
 
-void VPlanTransforms::materializeVectorTripCount(
+void VPlanTransforms::materializeConstantVectorTripCount(
     VPlan &Plan, ElementCount BestVF, unsigned BestUF,
     PredicatedScalarEvolution &PSE) {
   assert(Plan.hasVF(BestVF) && "BestVF is not available in Plan");
@@ -3191,6 +3191,62 @@ void VPlanTransforms::materializeBackedgeTakenCount(VPlan &Plan,
   BTC->replaceAllUsesWith(TCMO);
 }
 
+void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
+                                                 VPBasicBlock *VectorPHVPBB,
+                                                 bool TailByMasking,
+                                                 bool RequiresScalarEpilogue) {
+  VPValue &VectorTC = Plan.getVectorTripCount();
+  if (VectorTC.getNumUsers() == 0 ||
+      (VectorTC.isLiveIn() && VectorTC.getLiveInIRValue()))
+    return;
+  VPValue *TC = Plan.getTripCount();
+  Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(TC);
+  VPBuilder Builder(VectorPHVPBB, VectorPHVPBB->begin());
+
+  VPValue *Step = &Plan.getVFxUF();
+
+  // If the tail is to be folded by masking, round the number of iterations N
+  // up to a multiple of Step instead of rounding down. This is done by first
+  // adding Step-1 and then rounding down. Note that it's ok if this addition
+  // overflows: the vector induction variable will eventually wrap to zero given
+  // that it starts at zero and its Step is a power of two; the loop will then
+  // exit, with the last early-exit vector comparison also producing all-true.
+  // For scalable vectors the VF is not guaranteed to be a power of 2, but this
+  // is accounted for in emitIterationCountCheck that adds an overflow check.
+  if (TailByMasking) {
+    TC = Builder.createNaryOp(
+        Instruction::Add,
+        {TC, Builder.createNaryOp(
+                 Instruction::Sub,
+                 {Step, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 1))})},
+        DebugLoc::getUnknown(), "n.rnd.up");
+  }
+
+  // Now we need to generate the expression for the part of the loop that the
+  // vectorized body will execute. This is equal to N - (N % Step) if scalar
+  // iterations are not required for correctness, or N - Step, otherwise. Step
+  // is equal to the vectorization factor (number of SIMD elements) times the
+  // unroll factor (number of SIMD instructions).
+  VPValue *R = Builder.createNaryOp(Instruction::URem, {TC, Step},
+                                    DebugLoc::getUnknown(), "n.mod.vf");
+
+  // There are cases where we *must* run at least one iteration in the remainder
+  // loop.  See the cost model for when this can happen.  If the step evenly
+  // divides the trip count, we set the remainder to be equal to the step. If
+  // the step does not evenly divide the trip count, no adjustment is necessary
+  // since there will already be scalar iterations. Note that the minimum
+  // iterations check ensures that N >= Step.
+  if (RequiresScalarEpilogue) {
+    auto *IsZero = Builder.createICmp(
+        CmpInst::ICMP_EQ, R, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 0)));
+    R = Builder.createSelect(IsZero, Step, R);
+  }
+
+  auto Res = Builder.createNaryOp(Instruction::Sub, {TC, R},
+                                  DebugLoc::getUnknown(), "n.vec");
+  Plan.getVectorTripCount().replaceAllUsesWith(Res);
+}
+
 /// Returns true if \p V is VPWidenLoadRecipe or VPInterleaveRecipe that can be
 /// converted to a narrower recipe. \p V is used by a wide recipe that feeds a
 /// store interleave group at index \p Idx, \p WideMember0 is the recipe feeding
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 5943684e17a76..2a33fc8a6621a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -252,9 +252,16 @@ struct VPlanTransforms {
 
   // Materialize vector trip counts for constants early if it can simply be
   // computed as (Original TC / VF * UF) * VF * UF.
-  static void materializeVectorTripCount(VPlan &Plan, ElementCount BestVF,
-                                         unsigned BestUF,
-                                         PredicatedScalarEvolution &PSE);
+  static void
+  materializeConstantVectorTripCount(VPlan &Plan, ElementCount BestVF,
+                                     unsigned BestUF,
+                                     PredicatedScalarEvolution &PSE);
+
+  // Materialize vector trip count computations to a set of VPInstructions.
+  static void materializeVectorTripCount(VPlan &Plan,
+                                         VPBasicBlock *VectorPHVPBB,
+                                         bool TailByMasking,
+                                         bool RequiresScalarEpilogue);
 
   /// Materialize the backedge-taken count to be computed explicitly using
   /// VPInstructions.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 795de3d978e74..04ff5fe37a6df 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -9,19 +9,13 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
-; CHECK-NEXT:    [[TMP4:%.*]] = sub i64 [[TMP1]], 1
-; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 8, [[TMP4]]
-; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
-; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 8
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul <vscale x 8 x i64> [[TMP8]], splat (i64 1)
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = add <vscale x 8 x i64> zeroinitializer, [[TMP7]]
-; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP6]]
+; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP1]]
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[TMP12]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[DOTSPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
@@ -34,7 +28,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[TMP11:%.*]] = lshr <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = trunc <vscale x 8 x i64> [[TMP11]] to <vscale x 8 x i8>
 ; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[NEXT_GEP]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
-; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 8)
 ; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
@@ -92,19 +86,13 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
-; CHECK-NEXT:    [[TMP4:%.*]] = sub i64 [[TMP1]], 1
-; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP4]]
-; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
-; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 8
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul <vscale x 8 x i64> [[TMP8]], splat (i64 1)
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = add <vscale x 8 x i64> zeroinitializer, [[TMP7]]
-; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP6]]
+; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP1]]
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[TMP12]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[DOTSPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
@@ -117,7 +105,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT:    [[TMP11:%.*]] = lshr <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = trunc <vscale x 8 x i64> [[TMP11]] to <vscale x 8 x i8>
 ; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[NEXT_GEP]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
-; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
 ; CHECK-NEXT:    br i...
[truncated]

Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. The underlying value of the vector trip count VPValue needs to be re-set the generated IR value for the vector trip count to keep legacy epilogue skeleton generation working. But once the full skeleton is modeled in VPlan, we will be able to also handle the epilogue skeleton in VPlan.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

david-arm · 2025-08-07T09:20:07Z

llvm/test/Transforms/LoopVectorize/AArch64/pr151664-cost-hoisted-vector-scalable.ll

@@ -7,7 +7,6 @@
 ; In this example, `<vscale x 4 x float> @llvm.minimumnum` has an invalid cost,
 ; and hence should not be produced by LoopVectorize.

-; CHECK: LV: Found an estimated cost of Invalid for VF vscale x 4 For instruction:   %res = tail call float @llvm.minimumnum.f32(float %arg, float 0.000000e+00)


Why has this been removed? It looks like a critical part of the test. It also means the REQUIRES: asserts line near the top is redundant and we can remove the -debug-only RUN line argument.

The check here wasn't really relevant (it checks the legacy cost model). The issue has been fixed by #151940, the debug checks are now gone.

david-arm · 2025-08-07T09:24:57Z

llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll

@@ -1,24 +1,52 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5


Perhaps worth pre-committing a change to use the autogenerate script first?

Ah yes, I missed that one. Done in 47944d0

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

lukel97 · 2025-08-07T09:49:26Z

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-masked-access.ll

 ; SCALAR_EPILOGUE-NEXT:    [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 16 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
 ; SCALAR_EPILOGUE-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; SCALAR_EPILOGUE:       vector.body:
 ; SCALAR_EPILOGUE-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; SCALAR_EPILOGUE-NEXT:    [[VEC_IND:%.*]] = phi <vscale x 16 x i32> [ [[TMP5]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]


I think these lines are accidentally getting stripped. Do we need to drop/fix the --filter-out-after=^scalar.ph arg?

Ah yes, I missed updating that one. Should be fixed now

Auto-generate checks for #151925. Also update some naming to make more consistent with other tests.

Auto-generate checks for llvm/llvm-project#151925. Also update some naming to make more consistent with other tests.

lukel97

Went through the RISC-V diffs, LGTM

david-arm · 2025-08-07T10:50:33Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  if (RequiresScalarEpilogue) {
+    VPValue *IsZero = Builder.createICmp(
+        CmpInst::ICMP_EQ, R, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 0)));
+    R = Builder.createSelect(IsZero, Step, R);


This isn't a problem with your patch, but isn't there some odd behaviour here when tail-folding a loop that requires a scalar epilogue? Suppose TC=(VF * UF) + 1, for the tail-folding case we add (VF * UF) - 1, which means that the remainder is 0. If requiring a scalar epilogue, we then set the remainder to be TC - (VF * VF), i.e. 1. Performance won't be great. :)

Perhaps we don't permit tail-folding when requiring a scalar epilogue?

Yes they are mutually exclusive at the moment. Added an assert though, in case it changes in the future

llvm-ci · 2025-08-08T10:58:18Z

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/22311

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py (1183 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_logpoints.py (1184 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setExceptionBreakpoints.py (1185 of 2304)
PASS: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py (1186 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setFunctionBreakpoints.py (1187 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py (1188 of 2304)
PASS: lldb-api :: tools/lldb-dap/commands/TestDAP_commands.py (1189 of 2304)
PASS: lldb-api :: tools/lldb-dap/console/TestDAP_redirection_to_console.py (1190 of 2304)
PASS: lldb-api :: tools/lldb-dap/completions/TestDAP_completions.py (1191 of 2304)
UNRESOLVED: lldb-api :: functionalities/statusline/TestStatusline.py (1192 of 2304)
******************** TEST 'lldb-api :: functionalities/statusline/TestStatusline.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/statusline -p TestStatusline.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 82d633e9ff51962eb21fb58d38b74b4e8829a199)
  clang revision 82d633e9ff51962eb21fb58d38b74b4e8829a199
  llvm revision 82d633e9ff51962eb21fb58d38b74b4e8829a199
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_deadlock (TestStatusline.TestStatusline)
lldb-server exiting...
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_modulelist_deadlock (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_no_color (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_no_target (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_resize (TestStatusline.TestStatusline)
======================================================================
ERROR: test_modulelist_deadlock (TestStatusline.TestStatusline)
   Regression test for a deadlock that occurs when the status line is enabled before connecting
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/statusline/TestStatusline.py", line 199, in test_modulelist_deadlock
    self.expect(
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 95, in expect
    self.expect_prompt()
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 19, in expect_prompt
    self.child.expect_exact(self.PROMPT)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 181, in expect_loop
    return self.timeout(e)

llvm-ci · 2025-08-08T11:09:58Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-quick running on linaro-clang-aarch64-quick while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/20952

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/114/330' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-3734693-114-330.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=330 GTEST_SHARD_INDEX=114 /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 115 of 330.
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CompletionTest
[ RUN      ] CompletionTest.CommentsOnMembersFromHeader
ASTWorker building file /clangd-test/foo.cpp version null with command 
[/clangd-test]
clang -ffreestanding /clangd-test/foo.cpp
Driver produced command: cc1 -cc1 -triple aarch64-unknown-linux-gnu -fsyntax-only -disable-free -clear-ast-before-backend -main-file-name foo.cpp -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -ffreestanding -enable-tlsdesc -target-cpu generic -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -target-abi aapcs -debugger-tuning=gdb -fdebug-compilation-dir=/clangd-test -fcoverage-compilation-dir=/clangd-test -resource-dir lib/clang/22 -internal-isystem lib/clang/22/include -internal-isystem /usr/local/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -no-round-trip-args -target-feature -fmv -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c++ /clangd-test/foo.cpp
Building first preamble for /clangd-test/foo.cpp version null
not idle after addDocument
UNREACHABLE executed at ../llvm/clang-tools-extra/clangd/unittests/SyncAPI.cpp:22!
 #0 0x0000b496d18ff62c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xc4f62c)
 #1 0x0000b496d18fd0f4 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xc4d0f4)
 #2 0x0000b496d1900488 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x0000eacd537b78f8 (linux-vdso.so.1+0x8f8)
 #4 0x0000eacd5331f1f0 __pthread_kill_implementation ./nptl/./nptl/pthread_kill.c:44:76
 #5 0x0000eacd532da67c gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x0000eacd532c7130 abort ./stdlib/./stdlib/abort.c:81:7
 #7 0x0000b496d18abeb4 llvm::RTTIRoot::anchor() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xbfbeb4)
 #8 0x0000b496d175804c clang::clangd::runCodeComplete(clang::clangd::ClangdServer&, llvm::StringRef, clang::clangd::Position, clang::clangd::CodeCompleteOptions) (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xaa804c)
 #9 0x0000b496d12e6190 clang::clangd::(anonymous namespace)::CompletionTest_CommentsOnMembersFromHeader_Test::TestBody() CodeCompleteTests.cpp:0:0
#10 0x0000b496d1956908 testing::Test::Run() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xca6908)
#11 0x0000b496d1957c2c testing::TestInfo::Run() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xca7c2c)
#12 0x0000b496d1958868 testing::TestSuite::Run() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xca8868)
#13 0x0000b496d1968be8 testing::internal::UnitTestImpl::RunAllTests() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xcb8be8)
#14 0x0000b496d1968534 testing::UnitTest::Run() (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xcb8534)
#15 0x0000b496d1943650 main (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0xc93650)
#16 0x0000eacd532c73fc __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#17 0x0000eacd532c74cc call_init ./csu/../csu/libc-start.c:128:20
#18 0x0000eacd532c74cc __libc_start_main ./csu/../csu/libc-start.c:379:5
#19 0x0000b496d110e8f0 _start (/home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests+0x45e8f0)

--
exit: -6
--
shard JSON output does not exist: /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-3734693-114-330.json
********************

…. (#151925) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: llvm/llvm-project#151925

fhahn requested review from preames, lukel97, ayalz, aniragil and david-arm August 4, 2025 09:13

llvmbot added backend:RISC-V vectorizers llvm:transforms labels Aug 4, 2025

fhahn force-pushed the vplan-vector-trip-count branch 2 times, most recently from ac36655 to 486d6a6 Compare August 4, 2025 11:22

fhahn added 3 commits August 4, 2025 13:54

!fixup update missed test

db2fc58

!fixup add missing doc-comment

4a8927a

fhahn force-pushed the vplan-vector-trip-count branch from 486d6a6 to 4a8927a Compare August 4, 2025 12:54

lukel97 reviewed Aug 4, 2025

View reviewed changes

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Outdated Show resolved Hide resolved

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved

fhahn mentioned this pull request Aug 6, 2025

[VPlan] Materialize constant vector trip counts before final opts. #142309

Merged

fhahn added 2 commits August 7, 2025 09:32

Merge remote-tracking branch 'origin/main' into vplan-vector-trip-count

f9c873f

!fixup addres comments, thanks

e190397

david-arm reviewed Aug 7, 2025

View reviewed changes

lukel97 reviewed Aug 7, 2025

View reviewed changes

fhahn added a commit that referenced this pull request Aug 7, 2025

[LV] Auto-generate checks for sve-low-trip-count.ll.

47944d0

Auto-generate checks for #151925. Also update some naming to make more consistent with other tests.

fhahn added 2 commits August 7, 2025 11:01

Merge remote-tracking branch 'origin/main' into vplan-vector-trip-count

96b818a

!fixup address comments, thanks

3ced2d8

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 7, 2025

Automerge: [LV] Auto-generate checks for sve-low-trip-count.ll.

884fd53

Auto-generate checks for llvm/llvm-project#151925. Also update some naming to make more consistent with other tests.

lukel97 approved these changes Aug 7, 2025

View reviewed changes

david-arm reviewed Aug 7, 2025

View reviewed changes

fhahn added 2 commits August 8, 2025 09:43

Merge remote-tracking branch 'origin/main' into vplan-vector-trip-count

3a9e56a

!fixup update tests after update to latest main, add assert

319ff1a

Merge remote-tracking branch 'origin/main' into vplan-vector-trip-count

0db43c0

fhahn merged commit 82d633e into llvm:main Aug 8, 2025
9 checks passed

fhahn deleted the vplan-vector-trip-count branch August 11, 2025 09:19

		@@ -1,24 +1,52 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5

[VPlan] Materialize vector trip count using VPInstructions. #151925

[VPlan] Materialize vector trip count using VPInstructions. #151925

Uh oh!

Conversation

fhahn commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 4, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

david-arm Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

david-arm Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lukel97 Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fhahn Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

lukel97 left a comment

Choose a reason for hiding this comment

Uh oh!

david-arm Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Aug 8, 2025

Uh oh!

llvm-ci commented Aug 8, 2025

Uh oh!

Uh oh!

fhahn commented Aug 4, 2025 •

edited

Loading

llvmbot commented Aug 4, 2025 •

edited

Loading

lukel97 Aug 7, 2025 •

edited

Loading